Evaluation and Adaptation of the Celex Dutch Morphological Database
Authors
Abstract
This paper describes several important modifications to the Celex morphological database made in the context of the FLaVoR project. FLaVoR aims to develop a novel modular framework for speech recognition that enables the integration of complex linguistic knowledge sources, such as a morphological model. Morphology is a largely unexploited source of linguistic information from which speech recognizers could benefit. This is especially true for languages that allow a rich set of morphological operations, such as our target language, Dutch. In this paper we focus on the Celex Dutch morphological database as the information source underlying two different morphological analyzers being developed within the project. Although the Celex database is a valuable source of morphological information for Dutch, many modifications were necessary before it could be applied in practice. We identify the major problems, discuss the implemented solutions, and experimentally evaluate the effect of our modifications to the database.
Similar resources
A mixed word / morphological approach for extending CELEX for high coverage on contemporary large corpora
This paper describes an alternative approach to morphological language modeling, which incorporates constraints on the morphological production of new words. This is done by applying the constraints as a preprocessing step in which only one morphological production rule can be applied to an extended lexicon of known morphemes, lemmas and word forms. This approach is used to extend the CELEX Dut...
A Comparison of Two Different Approaches to Morphological Analysis of Dutch
This paper compares two systems for computational morphological analysis of Dutch. Both systems have been independently designed as separate modules in the context of the FLaVoR project, which aims to develop a modular architecture for automatic speech recognition. The systems are trained and tested on the same Dutch morphological database (CELEX), and can thus be objectively compared as morpho...
Knowledge-Free Induction of Inflectional Morphologies
We propose an algorithm to automatically induce the morphology of inflectional languages using only text corpora and no human input. Our algorithm combines cues from orthography, semantics, and syntactic distributions to induce morphological relationships in German, Dutch, and English. Using CELEX as a gold standard for evaluation, we show our algorithm to be an improvement over any knowledge-f...
Spelling space: A computational testbed for phonological and morphological changes in Dutch spelling
The Dutch spelling system, like other European spelling systems, represents a certain balance between preserving the spelling of morphemes (the morphological principle) and obeying letter-to-sound regularities (the phonological principle). We present experimental results with artificial learners that show a competition effect between the two principles: adhering more to one principle leads to m...
Refurbishing a Morphological Database for German
The CELEX database is one of the standard lexical resources for German. It yields a wealth of data especially for phonological and morphological applications. The morphological part comprises deep-structure morphological analyses of German. However, as it was developed in the Nineties, both encoding and spelling are outdated. About one fifth of over 50,000 datasets contain umlauts and signs suc...